
    Failure detectors as type boosters

    The power of an object type T can be measured as the maximum number n of processes that can solve consensus using only objects of T and registers. This number, denoted cons(T), is called the consensus power of T. This paper addresses the question of the weakest failure detector to solve consensus among a number k > n of processes that communicate using shared objects of a type T with consensus power n. In other words, we seek a failure detector that is sufficient and necessary to "boost" the consensus power of a type T from n to k. It was shown in Neiger (Proceedings of the 14th annual ACM symposium on principles of distributed computing (PODC), pp. 100-109, 1995) that a certain failure detector, denoted Ω_n, is sufficient to boost the power of a type T from n to k, and it was conjectured that Ω_n was also necessary. In this paper, we prove this conjecture for one-shot deterministic types. We first show that, for any one-shot deterministic type T with cons(T) ≤ n, Ω_n is necessary to boost the power of T from n to n+1. Then we go a step further and show that Ω_n is also the weakest to boost the power of (n+1)-ported one-shot deterministic types from n to any k > n. Our result generalizes, in a precise sense, the result of the weakest failure detector to solve consensus in asynchronous message-passing systems (Chandra et al. in J ACM 43(4):685-722, 1996). As a corollary, we show that Ω_t is the weakest failure detector to boost the resilience level of a distributed shared memory system, i.e., to solve consensus among n > t processes using (t − 1)-resilient objects of consensus power t

    The Failure Detector Abstraction

    This paper surveys the failure detector concept through two dimensions. First, we study failure detectors as building blocks to simplify the design of reliable distributed algorithms. More specifically, we illustrate how failure detectors can factor out timing assumptions to detect failures in distributed agreement algorithms. Second, we study failure detectors as computability benchmarks. That is, we survey the weakest failure detector question and illustrate how failure detectors can be used to classify problems. We also highlight some limitations of the failure detector abstraction along each of the dimensions

    Synchronization using failure detectors

    Many important synchronization problems in distributed computing are impossible to solve (in a fault-tolerant manner) in purely asynchronous systems, where message transmission delays and relative processor speeds are unbounded. It is then natural to seek the minimal synchrony assumptions that are sufficient to solve a given synchronization problem. A convenient way to describe synchrony assumptions is using the failure detector abstraction. In this thesis, we determine the weakest failure detectors for several fundamental problems in distributed computing: solving fault-tolerant mutual exclusion, solving non-blocking atomic commit, and boosting the synchronization power of atomic objects. We conclude the thesis with a perspective on the very definition of failure detectors

    The weakest failure detectors to boost obstruction-freedom

    It is considered good practice in concurrent computing to devise shared object implementations that ensure a minimal obstruction-free progress property and delegate the task of boosting liveness to independent generic oracles called contention managers. This paper determines necessary and sufficient conditions to implement wait-free and non-blocking contention managers, i.e., contention managers that ensure wait-freedom (resp. non-blockingness) of any associated obstruction-free object implementation. The necessary conditions hold even when universal objects (like compare-and-swap) or random oracles are available in the implementation of the contention manager. On the other hand, the sufficient conditions assume only basic read/write objects, i.e., registers. We show that failure detector ◇P is the weakest to convert any obstruction-free algorithm into a wait-free one, and Ω*, a new failure detector which we introduce in this paper, and which is strictly weaker than ◇P but strictly stronger than Ω, is the weakest to convert any obstruction-free algorithm into a non-blocking one. We also address the issue of minimizing the overhead imposed by contention management in low contention scenarios. We propose two intermittent failure detectors I_{Ω*} and I_{◇P} that are in a precise sense equivalent to, respectively, Ω* and ◇P, but allow for reducing the cost of failure detection in eventually synchronous systems when there is little contention. We present two contention managers: a non-blocking one and a wait-free one, that use, respectively, I_{Ω*} and I_{◇P}. When there is no contention, the first induces very little overhead whereas the second induces some non-trivial overhead. We show that wait-free contention managers, unlike their non-blocking counterparts, impose an inherent non-trivial overhead even in contention-free execution

    Failure detectors as type boosters

    The power of an object type T can be measured as the maximum number n of processes that can solve consensus using only objects of T and registers. This number, denoted cons(T), is called the consensus power of T. This paper addresses the question of the weakest failure detector to solve consensus among a number k > n of processes that communicate using shared objects of a type T with consensus power n. In other words, we seek a failure detector that is sufficient and necessary to "boost" the consensus power of a type T from n to k. It was shown in Neiger (Proceedings of the 14th annual ACM symposium on principles of distributed computing (PODC), pp. 100-109, 1995) that a certain failure detector, denoted Ω_n, is sufficient to boost the power of a type T from n to k, and it was conjectured that Ω_n was also necessary. In this paper, we prove this conjecture for one-shot deterministic types. We first show that, for any one-shot deterministic type T with cons(T) ≤ n, Ω_n is necessary to boost the power of T from n to n+1. Then we go a step further and show that Ω_n is also the weakest to boost the power of (n+1)-ported one-shot deterministic types from n to any k > n. Our result generalizes, in a precise sense, the result of the weakest failure detector to solve consensus in asynchronous message-passing systems (Chandra et al. in J ACM 43(4):685-722, 1996). As a corollary, we show that Ω_t is the weakest failure detector to boost the resilience level of a distributed shared memory system, i.e., to solve consensus among n > t processes using (t − 1)-resilient objects of consensus power t

    On Failure Detectors and Type Boosters

    The power of a set S of object types can be measured as the maximum number n of processes that can solve consensus using only types in S and registers. This number, denoted by h_m^r(S), is called the consensus power of S. The use of failure detectors can however "boost" the consensus power of types. This paper addresses the weakest failure detector type booster question, which consists in determining the weakest failure detector D such that, for any set S of types with h_m^r(S) = n, h_m^r(S;D) = n+1. We consider the failure detector Ω_n (introduced in Neiger, 1995) which outputs, at each process, a set of at most n processes so that, eventually, all correct processes detect the same set of processes that includes at least one correct process. We prove that Ω_n is the weakest failure detector type booster for deterministic one-shot types. As an interesting corollary of our result, we show that Ω_f is the weakest failure detector to boost the power of (f-1)-resilient objects solving consensus
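
The Ω_n property described in this abstract can be illustrated with a small sketch. It is not taken from the paper: the function name `omega_n_stabilization_step`, the trace representation, and the example data are all hypothetical, and "eventually" is approximated on a finite trace by "from some step until the end of the recorded trace". The sketch also assumes every correct process appears in the trace.

```python
from typing import Dict, FrozenSet, List, Optional

Process = str
# Hypothetical representation: for each process, the list of sets it output,
# one per step. All lists are assumed to have the same length.
Trace = Dict[Process, List[FrozenSet[Process]]]


def omega_n_stabilization_step(
    trace: Trace, correct: FrozenSet[Process], n: int
) -> Optional[int]:
    """Return the earliest step from which all correct processes output the
    same set containing a correct process, or None if the finite trace does
    not exhibit the Omega_n property."""
    # Every output, at every process and step, must contain at most n processes.
    if any(len(out) > n for outs in trace.values() for out in outs):
        return None

    lengths = {len(outs) for outs in trace.values()}
    if len(lengths) != 1 or 0 in lengths:
        return None  # malformed or empty trace
    (steps,) = lengths

    # All correct processes must agree on their final output.
    finals = {trace[p][-1] for p in correct}
    if len(finals) != 1:
        return None
    (stable,) = finals

    # The agreed set must include at least one correct process.
    if not (stable & correct):
        return None

    # Walk backwards to find where the stable suffix begins.
    t = steps
    while t > 0 and all(trace[p][t - 1] == stable for p in correct):
        t -= 1
    return t


# Illustrative data only: p3 is faulty, p1 and p2 are correct.
s = frozenset
trace: Trace = {
    "p1": [s({"p1"}), s({"p2"}), s({"p2", "p3"}), s({"p2", "p3"})],
    "p2": [s({"p2"}), s({"p2", "p3"}), s({"p2", "p3"}), s({"p2", "p3"})],
    "p3": [s({"p3"}), s({"p3"}), s({"p1"}), s({"p1"})],
}
correct = s({"p1", "p2"})
step = omega_n_stabilization_step(trace, correct, n=2)
```

In this example the correct processes settle on the set {p2, p3}, which contains the correct process p2, so the trace is consistent with Ω_2. The same trace fails the check for n = 1, because some outputs contain two processes.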

    The Weakest Failure Detector for Non-Blocking Atomic Commit

    This paper addresses the question of the weakest failure detector for solving the Non-Blocking Atomic Commit problem (NBAC) in a message-passing system where processes can fail by crashing. We define a failure detector, denoted by X, which we show to be sufficient to solve NBAC with a majority of correct processes. Then we give an algorithm which, no matter how many processes may crash, uses any failure detector that solves NBAC to emulate X, i.e., we prove that X is also necessary for solving NBAC. The result was obtained concurrently and independently by Hadzilacos and Toueg

    The CHT Play

    This note gives a high level and informal account of the necessary part of the proof that Ω is the weakest failure detector to implement consensus with a majority of correct processes. The proof originally appeared in a widely cited but rarely understood paper by Chandra, Hadzilacos and Toueg. We describe it here as a play in five acts, preceded by a prologue and followed by an epilogue

    The weakest failure detectors to solve Quittable Consensus and Non-Blocking Atomic Commit

    We introduce quittable consensus, a natural variation of the consensus problem, where processes have the option to agree on “quit” if failures occur, and we relate this problem to the well-known problem of non-blocking atomic commit. We then determine the weakest failure detectors for these two problems in all environments, regardless of the number of faulty processes

    Mutual Exclusion in Asynchronous Systems with Failure Detectors

    This paper defines the fault-tolerant mutual exclusion problem in a message-passing asynchronous system and determines the weakest failure detector to solve the problem. This failure detector, which we call the trusting failure detector, and which we denote by T, is strictly weaker than the perfect failure detector P but strictly stronger than the eventually perfect failure detector ◇P. The paper shows that a majority of correct processes is necessary to solve the problem with T. Moreover, T is also the weakest failure detector to solve the fault-tolerant group mutual exclusion problem
